Evolutionary Instance Resampling for Difficult Data Sets

Authors

Khaled Rasheed

Walter D. Potter

Prashant Doshi

Maureen Grasso

William Dale Richardson

Abstract

In machine learning, data set characteristics such as across-class imbalance and class overlap often pose difficulties for classifier algorithms. A number of methods alleviate these difficulties by adjusting the distribution of the data set before classifier construction. This adjustment, known as resampling, is typically effected by re-weighting, removing, or duplicating instances. Finding a good distribution for the data set, however, is a nontrivial problem. Evolutionary algorithms are frequently used to search for solutions in large, difficult search spaces. In this thesis, four evolutionary approaches are applied to the problem of instance resampling across a variety of data sets and classifier paradigms. In many cases, the evolutionary pre-processing methods are able to produce better classifiers. In particular, an integer-based, one-to-one representation and a cluster-based, real-valued weighting scheme are shown to be beneficial for improving classifier performance on difficult data sets.

Index words: genetic algorithms, machine learning, imbalance, undersampling, oversampling, instance selection
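The search described in the abstract can be illustrated with a small sketch. It is not the thesis's method: the integer genome (one copy count per training instance, where 0 removes the instance, 1 keeps it, and larger values duplicate it), the nearest-centroid classifier, the balanced-accuracy fitness, and every function name and parameter below are assumptions chosen to keep the example short.

```python
import random

def centroids(X, y, counts):
    """Per-class mean of the training instances, each weighted by its copy count."""
    sums, totals = {}, {}
    for xi, yi, c in zip(X, y, counts):
        if c == 0:  # a count of 0 removes the instance
            continue
        acc = sums.setdefault(yi, [0.0] * len(xi))
        for j, v in enumerate(xi):
            acc[j] += c * v
        totals[yi] = totals.get(yi, 0) + c
    return {k: [s / totals[k] for s in acc] for k, acc in sums.items()}

def predict(cents, x):
    """Label of the centroid closest to x (squared Euclidean distance)."""
    return min(cents, key=lambda k: sum((a - b) ** 2 for a, b in zip(cents[k], x)))

def balanced_accuracy(cents, X, y):
    """Mean per-class recall; a class with no centroid scores 0."""
    recalls = []
    for k in sorted(set(y)):
        idx = [i for i, yi in enumerate(y) if yi == k]
        hits = sum(1 for i in idx if k in cents and predict(cents, X[i]) == k)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)

def evolve(X_tr, y_tr, X_val, y_val, pop_size=20, generations=30,
           max_copies=3, seed=0):
    """Evolve integer copy counts for the training set (hypothetical helper)."""
    rng = random.Random(seed)
    n = len(X_tr)

    def fitness(genome):
        return balanced_accuracy(centroids(X_tr, y_tr, genome), X_val, y_val)

    # Seed the population with the unmodified data set (all counts equal 1).
    pop = [[1] * n] + [[rng.randint(0, max_copies) for _ in range(n)]
                       for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]  # truncation selection with elitism
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            child[rng.randrange(n)] = rng.randint(0, max_copies)  # point mutation
            children.append(child)
        pop = parents + children
    best = max(pop, key=fitness)
    return best, fitness(best)
```

Because the unmodified data set is part of the initial population and the better half always survives, the returned validation score can never fall below that of the original distribution. Whether the gain carries over to unseen data would have to be checked on a separate test set.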

Similar resources

Evolutionary rule-based systems for imbalanced data sets

This paper investigates the capabilities of evolutionary online rule-based systems, also called Learning Classifier Systems (LCSs), for extracting knowledge from imbalanced data. While some learners may suffer from class imbalances and instances sparsely distributed around the feature space, we show that LCSs are flexible methods that can be adapted to detect such cases and find suitable models...

An Improved Algorithm for SVMs Classification of Imbalanced Data Sets

Support Vector Machines (SVMs) have strong theoretical foundations and excellent empirical success in many pattern recognition and data mining applications. However, when induced from imbalanced training sets, where the examples of the target class (minority) are outnumbered by the examples of the non-target class (majority), the SVM classifier performs less well. In medical diag...

Credit Card Fraud Detection using Data mining and Statistical Methods

With today’s advances in technology and business, fraud detection has become a critical component of financial transactions. Given the vast amounts of data in large datasets, it becomes more difficult to detect fraudulent transactions manually. In this research, we propose a combined method using both data mining and statistical tasks, utilizing feature selection, resampling and cost-...

A Study on the Combination of Evolutionary Algorithms and Stratified Strategies for Training Set Selection in Data Mining

Evolutionary algorithms are adaptive methods based on natural evolution that may be used for search and optimization. As Training Set Selection can be viewed as a search problem, it could be solved using evolutionary algorithms. In this paper, we have carried out an empirical study of the performance of CHC as a representative evolutionary algorithm model. This study includes a comparison between...

Time-stamped resampling for robust evolutionary portfolio optimization

Traditional mean-variance financial portfolio optimization is based on two sets of parameters, estimates for the asset returns and the variance-covariance matrix. The allocations resulting from both traditional methods and heuristics are very dependent on these values. Given the unreliability of these forecasts, the expected risk and return for the portfolios in the efficient frontier often dif...

Publication date: 2013
